All Questions
Tagged with neural-network, gradient-descent
123 questions
2 votes
1 answer
91 views
Why are the second-order derivatives of a loss function nonzero when linear combinations are involved?
I'm working on implementing Newton's method to perform second-order gradient descent in a neural network and am having trouble computing the second-order derivatives. I understand that in practice, ...
1 vote
1 answer
102 views
In a computational graph, how do I calculate the total upstream gradient of a node with multiple upstream paths?
Given a computation graph with a node (like the one below), I understand that I can use the upstream gradient dL/dz to calculate all of my downstream gradients. But what if there are multiple ...
1 vote
2 answers
399 views
Gradient Descent: Is the magnitude of gradient vectors arbitrary?
I am only just getting familiar with gradient descent through learning logistic regression. I understand that the directional component of the gradient vector is correct information derived from the slope ...
2 votes
1 answer
544 views
Gradients of lower layers of a NN when the gradient of an upper layer is 0?
Say we have a neural network with an input layer, a hidden layer and an output layer. Say the gradients with respect to the weights and biases of the output layer are all 0. Then, by backpropagation ...
1 vote
1 answer
76 views
Doubt about gradients and the vanishing gradient problem in backpropagation
As far as I know, in backpropagation the gradient of the loss function is used to update the weights. In backpropagation, the weight updates become small w.r.t. the gradients, and this leads to the vanishing gradient problem. ...
5 votes
2 answers
10k views
What exactly is the gradient norm?
I found that there is no common resource or well-defined definition for "gradient norm"; most search results are ML experts providing answers that involve the gradient norm, or papers ...
0 votes
1 answer
227 views
Affine layer - gradient shape
In the course cs231n, I need to implement the backward-pass computation for an affine (linear) layer: ...
0 votes
1 answer
127 views
GAN Generator Backpropagation Gradient Shape Doesn't Match
In the TensorFlow example (https://www.tensorflow.org/tutorials/generative/dcgan#the_discriminator) the discriminator has a single output neuron (assume batch_size=1). Then over in the training loop ...
0 votes
0 answers
116 views
Why is backpropagation done in every epoch when the loss is always a scalar?
I understand that the backpropagation algorithm calculates the derivative of the loss with respect to all the parameters in the neural network. My question is: this derivative is constant, right, because the ...
2 votes
0 answers
142 views
Can I find the input that maximises the output of a Neural Network?
So I trained a 2 layer Neural Network for a regression problem that takes $D$ features $(x_1,...,x_D)$ and outputs a real value $y$. With the model already trained (weights optimised, fixed), can I ...
0 votes
0 answers
657 views
Proof that averaging weights is equivalent to averaging gradients (FedSGD vs FedAvg)
The first federated learning paper, "Communication-Efficient Learning of Deep Networks from Decentralized Data", presents FedSGD and FedAvg. In federated learning, the learning task is ...
0 votes
0 answers
155 views
Calculating the derivative of the bias in backpropagation
Looking at the algorithm on Wikipedia, we can implement backpropagation by calculating $$\delta^{L}=\left(f^{L}\right)'\cdot\nabla_{a^{L}}C$$ (where I treat $\left(f^{L}\right)'$ as an $n\times n$ ...
2 votes
1 answer
874 views
How does gradient descent avoid local minima?
In Neural Networks and Deep Learning, the gradient descent algorithm is described as moving in the opposite direction of the gradient (link to the relevant place in the book). What prevents this strategy from landing in ...
1 vote
1 answer
4k views
How to calculate the loss function?
I hope you are doing well. I want to ask a question regarding the loss function in a neural network. I know that the loss function is calculated for each data point in the training set, and then the ...
1 vote
0 answers
55 views
How to interpret integrated gradients in an NLP toxic text classification use-case?
I am trying to understand how integrated gradients work in the NLP case. Let $F: \mathbb{R}^{n} \rightarrow [0,1]$ be a function representing a neural network, $x \in \mathbb{R}^{n}$ an input, and $x' \in ...